Abstract: The loop series provides a formal way to write
down corrections to the Bethe entropy (and/or free energy) of 
graphical models. We provide methods to rigorously control such 
expansions for low-density parity-check codes used over a highly 
noisy binary symmetric channel. We prove that in the asymptotic 
limit of large size, with high probability, the Bethe expression 
gives an exact formula for the entropy (per bit) of the input 
word conditioned on the output of the channel. Our methods 
also apply to more general models. 

I. Introduction 

Often one needs to compute the free energy and/or entropy 
of a graphical model. The Bethe approximation and the related 
Belief Propagation (BP) equations may sometimes offer a good 
starting point. However it is seldom a controlled approximation 
and even worse it is usually not clear if it yields upper or lower 
bounds, or even if there is any such relationship. There are not 
many results that precisely pinpoint the relation between the 
Bethe and true free energies or entropies. A general result of 
Vontobel [1] relates the Bethe free energy to an average of the
true free energy over all graph covers. For Ising-like graphical 
models with attractive pair interactions, Wainwright [2] has
shown that, under additional special conditions, the Bethe free 
energy is a bound to the true free energy. This work uses the 
same loop series used here. It is well known that the Bethe 
free energy is exact on trees, and it is natural to investigate 
its possible exactness on random Erdős-Rényi type graphs
which are known to be locally tree-like. But we already know
of systems, such as random constraint satisfaction models
(e.g., K-SAT or Q-coloring) or spin glasses, where the true
free energy is not given by the Bethe formula, even when
averaged over the graph ensemble. The local tree-like nature 
of the graph is not sufficient when long ranged correlations 
are present 0. 

For graphical models that describe communication with low 
density (parity-check and generator-matrix) codes over binary- 
symmetric memoryless channels the situation is favorable. 
Indeed we have plenty of evidence that the replica-symmetric 
solution^ is exact. See 0, lUD for bounds and Q, for 
results on the binary erasure channel. In [9] it is proven that
correlations between pairs of distant (with respect to Tanner
graph distance) bits decay exponentially fast for LDGM codes
in the regime of large noise, and for LDPC codes in the regime
of small noise. This also made it possible to conclude that the
replica-symmetric formulas are exact in these regimes.

¹ Replica-symmetric formulas are averaged forms of the Bethe formulas,
where the average is over the channel output realizations and code ensemble.

A few years ago Chertkov and Chernyak [10] developed
a loop series representation for the free energy of graphical 
models. The virtue of this representation is that it isolates the 
Bethe contribution, and represents the remainder by a series of 
terms involving only BP messages associated to generalized 
loops of the graph. It is tempting to use this representation as 
a tool to compare the true and Bethe free energies. 

In this contribution we consider regular LDPC$(l, r)$ codes
used over a highly noisy BSC. Consider the conditional
entropy $\frac{1}{n} H(\underline{X}|\underline{Y})$ of the input word
$\underline{X} = (X_1 \cdots X_n)$ given a channel output
$\underline{Y} = (Y_1 \cdots Y_n)$. We prove that in the
large size limit, with high probability with respect to the code
ensemble, the difference between the conditional entropy and
the Bethe formula tends to zero. The error term essentially
comes from the probability that the graph is not locally
tree-like. Our techniques also allow us to organize the dominant
correction terms into a polymer expansion² involving generalized
loops of size less than $\lambda_0 n$ ($0 < \lambda_0 < 1$ a constant). As
we will show, expander arguments imply that this polymer
expansion converges uniformly in $n$. When the terms of the
polymer expansion are added to the Bethe expression, with
high probability, the difference with the conditional entropy
becomes $O(e^{-n\epsilon})$ for some $\epsilon > 0$.

Our results also apply to more general models. Namely 
the channel could have asymmetric flip probability. In fact 
the whole technique and results apply to spin-glass models 
on $(l, r)$ Tanner graphs with $l$ odd and $l < r$, with small
magnetic fields, and any temperature. The limitation to $l < r$
is not just technical. Indeed $l > r$ would correspond to a
kind of XORSAT constraint satisfaction problem, and for the
usual XORSAT problem we know that the replica symmetric
solutions are not generally exact at low temperatures.

The case $l = 2$ (cycle codes) has its own special features
and has been discussed in [12].

II. Preliminaries 

We begin with a few definitions and notations. Fix two
integers $l < r$. Consider two vertex sets: $V$, a set of $n$ variable
nodes, and $C$, a set of $m = nl/r$ check nodes. We think of $n$
large and $l, r$ fixed. We consider bipartite $(l, r)$ regular graphs,
call them $\Gamma$, connecting $V$ and $C$. The set of edges is $E$.
More precisely, vertices of $V$ have degree $l$, vertices of $C$ have
degree $r$, and there are no double edges. The set of all such
graphs is denoted $B(l, r, n)$. Note that $\Gamma$ is the Tanner graph
of an LDPC code with design rate $1 - l/r$. When we say that $\Gamma$
is random we mean that we draw it uniformly at random from
the set $B(l, r, n)$. The corresponding expectation is $\mathbb{E}_\Gamma$.

² See [11] for a pedagogical introduction to polymer expansions.

Letters $i, j$ will always denote nodes in $V$ and letters $a, b$
nodes in $C$. We reserve the notations $\partial i$ (resp. $\partial a$) for the sets
of neighbors of $i$ (resp. $a$) in $\Gamma$.

We will say that $\Gamma$ is a $(\lambda, \kappa)$ expander if for every subset
$\mathcal{V} \subset V$ such that $|\mathcal{V}| < \lambda n$ we have
$|\partial\mathcal{V}| \geq \kappa l |\mathcal{V}|$. Here
$\partial\mathcal{V}$ is the set of check nodes that are connected to $\mathcal{V}$.
Take a random $\Gamma$. We can always find $\lambda > 0$ such that with
probability $1 - O(n^{-(l(1-\kappa)-1)})$, $\Gamma$ is a $(\lambda, \kappa)$ expander with
$\kappa < 1 - \frac{1}{l}$. It is sufficient to take $0 < \lambda < \lambda_0$ where $\lambda_0$ is the
positive solution of the equation³

$$\frac{l-1}{l}\, h_2(\lambda_0) - \frac{1}{r}\, h_2(\lambda_0 \kappa r) - \lambda_0 \kappa r\, h_2\Big(\frac{1}{\kappa r}\Big) = 0. \tag{1}$$

As will be seen later we need to take
$\kappa \in \,]1 - \frac{2(r-1)}{lr},\, 1 - \frac{1}{l}[$
(which is always possible for $r > 2$). In the rest of the paper
$\kappa$ is always a constant in this interval, and $0 < \lambda < \lambda_0$. For
concreteness, one can take the case $(l, r) = (3, 6)$, fix $\kappa = 1/2$
and $\lambda_0 = 5 \times 10^{-4}$.
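The threshold $\lambda_0$ can be located numerically. The following is a minimal sketch, assuming that (1) reads $\frac{l-1}{l} h_2(\lambda_0) - \frac{1}{r} h_2(\lambda_0 \kappa r) - \lambda_0 \kappa r\, h_2(1/(\kappa r)) = 0$ with $h_2$ the binary entropy function; the function names and the bisection bracket are illustrative choices, not taken from the paper.

```python
import math


def h2(x):
    """Binary entropy in bits, with h2(0) = h2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def expander_lhs(lam, l, r, kappa):
    """Left-hand side of (1), in the form assumed in the text above."""
    return (
        (l - 1) / l * h2(lam)
        - h2(lam * kappa * r) / r
        - lam * kappa * r * h2(1.0 / (kappa * r))
    )


def lambda_0(l, r, kappa, lo=1e-9, hi=1e-2, iters=200):
    """Bisection for the small positive root; needs a sign change on [lo, hi]."""
    if not (expander_lhs(lo, l, r, kappa) > 0.0 > expander_lhs(hi, l, r, kappa)):
        raise ValueError("no sign change on the chosen bracket")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if expander_lhs(mid, l, r, kappa) > 0.0:
            lo = mid  # still in the region where the left-hand side is positive
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    print("root of (1) for (3, 6), kappa = 1/2:", lambda_0(3, 6, 0.5))
    print("left-hand side at 5e-4:", expander_lhs(5e-4, 3, 6, 0.5))
```

For $(l, r) = (3, 6)$ and $\kappa = 1/2$ the left-hand side is positive at $5 \times 10^{-4}$, so under this reading the quoted value lies below the root found by the bisection and is an admissible choice.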

Assume that we transmit (with uniform prior) code words
from an LDPC code with Tanner graph $\Gamma$ over a BSC with flip
probability $p$. We assume without loss of generality that the
all-zero codeword is transmitted. Then the posterior probability
that $\underline{x} = (x_i)_{i=1}^n \in \{0, 1\}^n$ is the transmitted word, given that
$\underline{y} = (y_i)_{i=1}^n \in \{0, 1\}^n$ is received, reads

$$p_{\underline{X}|\underline{Y}}(\underline{x}|\underline{y}) = \frac{1}{Z} \prod_{a \in C} 1\big(\oplus_{i \in \partial a} x_i = 0\big) \prod_{i \in V} \exp\big((-1)^{x_i} h_i\big). \tag{2}$$

The graph $\Gamma$ enters in this formula through the parity check
constraints. In this formula $h_i = (-1)^{y_i} \frac{1}{2} \ln \frac{1-p}{p}$ and $Z$ is the
normalizing factor

$$Z = \sum_{\underline{x} \in \{0,1\}^n} \prod_{a \in C} 1\big(\oplus_{i \in \partial a} x_i = 0\big) \prod_{i \in V} \exp\big((-1)^{x_i} h_i\big). \tag{3}$$

We set

$$h \equiv \frac{1}{2} \ln \frac{1-p}{p}. \tag{4}$$

It is good to keep in mind that the high noise regime considered
in this paper corresponds to small $h$ ($p$ close to $1/2$) and
that $|h_i| = h$.

It is equivalent to describe the channel outputs in terms of
$\underline{y}$ or in terms of the half-log-likelihood variables $\underline{h} = (h_i)_{i=1}^n$.
Note that the $h_i$ have the probability distribution
$c(h_i) = (1-p)\,\delta\big(h_i - \frac{1}{2}\ln\frac{1-p}{p}\big) + p\,\delta\big(h_i - \frac{1}{2}\ln\frac{p}{1-p}\big)$.
The expectation with
respect to this distribution is called $\mathbb{E}_{\underline{h}}$. We are interested in

3 See e.g Q where the standard LDPC(Z, r,n) ensemble is considered. It 
is easily argued that the same result applies to B(l,r,n). 



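the conditional entropy $H(\underline{X}|\underline{Y})$ of the input word given the
output word. We have (see e.g., [3])

$$h_n \equiv \frac{1}{n} H(\underline{X}|\underline{Y}) = \frac{1}{n}\, \mathbb{E}_{\underline{h}}[\ln Z] - \frac{1-2p}{2} \ln \frac{1-p}{p}. \tag{5}$$

In (5), $n^{-1} \ln Z$ is the free energy of the Gibbs measure (2).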
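Formula (5) can be checked by brute force on a toy code. The sketch below assumes the reading $\frac{1}{n}H(\underline{X}|\underline{Y}) = \frac{1}{n}\mathbb{E}_{\underline{h}}[\ln Z] - \frac{1-2p}{2}\ln\frac{1-p}{p}$ and compares it with the conditional entropy computed directly from its definition (uniform prior on codewords, BSC with flip probability $p$). The tiny parity-check codes and all function names are hypothetical illustrations, not objects from the paper.

```python
import itertools
import math


def codewords(n, checks):
    """All x in {0,1}^n that satisfy every parity check (brute force)."""
    return [
        x for x in itertools.product((0, 1), repeat=n)
        if all(sum(x[i] for i in c) % 2 == 0 for c in checks)
    ]


def entropy_from_free_energy(p, n, checks):
    """Right-hand side of (5), averaging ln Z with the all-zero word sent."""
    h = 0.5 * math.log((1 - p) / p)
    code = codewords(n, checks)
    avg_ln_z = 0.0
    for y in itertools.product((0, 1), repeat=n):
        prob_y = math.prod(p if yi else 1 - p for yi in y)
        fields = [h if yi == 0 else -h for yi in y]  # h_i = (-1)^{y_i} h
        z = sum(
            math.exp(sum(f if xi == 0 else -f for xi, f in zip(x, fields)))
            for x in code
        )
        avg_ln_z += prob_y * math.log(z)
    return avg_ln_z / n - (1 - 2 * p) * h


def entropy_direct(p, n, checks):
    """(1/n) H(X|Y) from the definition, in nats."""
    code = codewords(n, checks)
    total = 0.0
    for y in itertools.product((0, 1), repeat=n):
        like = [
            math.prod(p if xi != yi else 1 - p for xi, yi in zip(x, y))
            for x in code
        ]
        norm = sum(like)
        for lk in like:
            # joint probability times minus the log of the posterior
            total -= (lk / len(code)) * math.log(lk / norm)
    return total / n


if __name__ == "__main__":
    checks = [(0, 1, 2), (1, 2, 3)]
    print(entropy_from_free_energy(0.4, 4, checks), entropy_direct(0.4, 4, checks))
```

The two numbers agree up to rounding because, by channel symmetry and linearity of the code, every transmitted codeword contributes the same conditional entropy as the all-zero word.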

III. The Bethe Approximation 

The Bethe free energy involves a set of messages
$\{\eta_{i \to a}, \hat\eta_{a \to i}\}$ attached to the edges of $\Gamma$. The collection of
all messages is denoted $(\eta, \hat\eta)$. These satisfy the BP equations

$$\eta_{i \to a} = h_i + \sum_{b \in \partial i \setminus a} \hat\eta_{b \to i}, \qquad \hat\eta_{a \to i} = \tanh^{-1}\Big(\prod_{j \in \partial a \setminus i} \tanh \eta_{j \to a}\Big).$$

These equations always have a trivial solution
$\tanh \eta_{i \to a} = \tanh \hat\eta_{a \to i} = 1$. We will consider only non-trivial solutions that
are relevant for small $h$. For these solutions $\eta_{i \to a}$ and $\hat\eta_{a \to i}$
take small values and we can show that
$|\eta_{i \to a}| \leq |h| + (l-1)|h|^{r-1} + O(|h|^r)$ and
$|\hat\eta_{a \to i}| \leq |h|^{r-1} + O(|h|^r)$. We call
such solutions high-noise-solutions.

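These solutions have a Bethe free energy

$$f_{\rm Bethe}(\eta, \hat\eta) = \frac{1}{n}\Big(\sum_{a \in C} F_a + \sum_{i \in V} F_i - \sum_{(i,a) \in E} F_{ia}\Big), \tag{6}$$

where

$$F_a = \ln\Big\{\frac{1}{2}\Big(1 + \prod_{i \in \partial a} \tanh \eta_{i \to a}\Big)\Big\} + \sum_{i \in \partial a} \ln(2 \cosh \eta_{i \to a}),$$
$$F_i = \ln\Big\{2 \cosh\Big(h_i + \sum_{a \in \partial i} \hat\eta_{a \to i}\Big)\Big\},$$
$$F_{ia} = \ln\{2 \cosh(\eta_{i \to a} + \hat\eta_{a \to i})\}.$$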
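As a sanity check of the BP equations and of the terms $F_a$, $F_i$, $F_{ia}$, one can verify the well-known exactness of the Bethe free energy on trees. The sketch below assumes the forms $\eta_{i\to a} = h_i + \sum_{b\in\partial i\setminus a}\hat\eta_{b\to i}$, $\hat\eta_{a\to i} = \tanh^{-1}(\prod_{j\in\partial a\setminus i}\tanh\eta_{j\to a})$ and $f_{\rm Bethe} = \frac{1}{n}(\sum_a F_a + \sum_i F_i - \sum_{(i,a)} F_{ia})$. The toy Tanner graph (two checks sharing one variable) is a hypothetical tree, not an $(l, r)$ regular graph, and all names are illustrative.

```python
import itertools
import math


def neighbors(n, checks):
    """For each variable i, the list of checks a that contain it."""
    return [[a for a, c in enumerate(checks) if i in c] for i in range(n)]


def variable_messages(h, checks, eta_hat):
    """First BP equation: eta_{i->a} = h_i + sum of eta_hat_{b->i}, b != a."""
    nbrs = neighbors(len(h), checks)
    return {
        (i, a): h[i] + sum(eta_hat[(b, i)] for b in nbrs[i] if b != a)
        for i in range(len(h)) for a in nbrs[i]
    }


def run_bp(h, checks, sweeps=50):
    """Iterate BP starting from eta_hat = 0 (assumes check degrees >= 2)."""
    eta_hat = {(a, i): 0.0 for a, c in enumerate(checks) for i in c}
    for _ in range(sweeps):
        eta = variable_messages(h, checks, eta_hat)
        eta_hat = {
            (a, i): math.atanh(
                math.prod(math.tanh(eta[(j, a)]) for j in c if j != i)
            )
            for a, c in enumerate(checks) for i in c
        }
    return variable_messages(h, checks, eta_hat), eta_hat


def bethe_free_energy(h, checks, eta, eta_hat):
    n = len(h)
    nbrs = neighbors(n, checks)
    f_a = sum(
        math.log(0.5 * (1 + math.prod(math.tanh(eta[(i, a)]) for i in c)))
        + sum(math.log(2 * math.cosh(eta[(i, a)])) for i in c)
        for a, c in enumerate(checks)
    )
    f_i = sum(
        math.log(2 * math.cosh(h[i] + sum(eta_hat[(a, i)] for a in nbrs[i])))
        for i in range(n)
    )
    f_ia = sum(
        math.log(2 * math.cosh(eta[(i, a)] + eta_hat[(a, i)]))
        for a, c in enumerate(checks) for i in c
    )
    return (f_a + f_i - f_ia) / n


def exact_free_energy(h, checks):
    """(1/n) ln Z by summing over all words that satisfy the checks."""
    z = 0.0
    for x in itertools.product((0, 1), repeat=len(h)):
        if all(sum(x[i] for i in c) % 2 == 0 for c in checks):
            z += math.exp(sum(hi if xi == 0 else -hi for xi, hi in zip(x, h)))
    return math.log(z) / len(h)


if __name__ == "__main__":
    tree = [(0, 1, 2), (2, 3, 4)]            # two checks sharing variable 2
    fields = [0.1, -0.1, 0.1, 0.1, -0.1]     # h_i = +/- h with h = 0.1
    eta, eta_hat = run_bp(fields, tree)
    print(bethe_free_energy(fields, tree, eta, eta_hat),
          exact_free_energy(fields, tree))
```

On a graph with cycles the two values differ, and the difference is the quantity that the loop series expresses in terms of generalized loops.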

Theorem 1: Suppose $l$ is odd and $3 \leq l < r$. There exists
$h_0 > 0$ (small) independent of $n$, such that for $|h| < h_0$ and
any high-noise-solution $(\eta, \hat\eta)$ of the BP equations,

$$\mathbb{E}_\Gamma\Big[\Big|\frac{1}{n} \ln Z - f_{\rm Bethe}(\eta, \hat\eta)\Big|\Big] = O\Big(\frac{1}{n^{l(1-\kappa)-1}}\Big). \tag{7}$$

The $O(\cdot)$ is uniform in the channel output realizations $\underline{h}$.

Remark 1: By Markov's bound we obtain that the difference
between the true and Bethe free energies tends to zero
with high probability, in the $n \to +\infty$ limit.

Remark 2: We can average equation (7) over the channel
output and use (5) to relate the true and Bethe entropies.

IV. Loop Corrections to the Bethe Approximation 

We define a generalized loop $g$ as any subgraph contained
in $\Gamma$ with no dangling edges (Figure 1). Note that a generalized
loop is not necessarily connected. We call $d_i(g)$ (resp. $d_a(g)$)
the induced degree of node $i$ (resp. $a$) in $g$. For a generalized
loop we have $d_i(g) \in \{2, \dots, l\}$ and $d_a(g) \in \{2, \dots, r\}$.

[Fig. 1. Example of $\Gamma \in B(3, 4, 8)$. The generalized loop $g = \gamma_1 \cup \gamma_2$ has two disjoint
connected parts $\gamma_1$ and $\gamma_2$, with $\gamma_1 \cap \gamma_2 = \emptyset$.]

For a finite size system, the loop series [10] is an identity
valid for any solution of the BP equations. We have

$$\frac{1}{n} \ln Z - f_{\rm Bethe}(\eta, \hat\eta) = \frac{1}{n} \ln\Big\{\sum_{g \subset \Gamma} K(g)\Big\}. \tag{8}$$

The sum on the right hand side runs over all generalized
loops included in $\Gamma$. The $K(g)$ can be expressed entirely in
terms of BP messages $\eta_{i \to a}$ and $\hat\eta_{a \to i}$. The explicit formula
is given in the appendix. Remarkably $K(g)$ factorizes into a
product of contributions associated to the connected parts of
$g$. Each generalized loop can be decomposed in a unique way
as a union $g = \cup_k \gamma_k$ where the $\gamma_k$ are connected and disjoint
generalized loops. The $\gamma_k$'s are called polymers. We have
$K(g) = \prod_k K(\gamma_k)$ and

$$\sum_{g \subset \Gamma} K(g) = \sum_{M \geq 0} \frac{1}{M!} \sum_{\gamma_1, \dots, \gamma_M \subset \Gamma} \prod_{k=1}^{M} K(\gamma_k) \prod_{k < k'} 1\big(\gamma_k \cap \gamma_{k'} = \emptyset\big). \tag{9}$$

In the sum each $\gamma_k$ runs over all polymers contained in $\Gamma$. The
factor $\frac{1}{M!}$ accounts for the fact that a polymer configuration
has to be counted only once. Finally the indicator function
ensures that the polymers do not intersect. Because of this
constraint all sums in (9) are finite.
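The decomposition of a generalized loop into polymers is a connected-components computation. The sketch below represents a subgraph by its set of edges `(i, a)` (variable index, check index), tests the "no dangling edges" condition through the induced degrees, and splits the subgraph into its connected parts. The representation and function names are assumptions made for illustration only.

```python
from collections import defaultdict


def induced_degrees(edges):
    """Induced degree of every variable and check node touched by `edges`."""
    deg = defaultdict(int)
    for i, a in edges:
        deg[("var", i)] += 1
        deg[("chk", a)] += 1
    return dict(deg)


def is_generalized_loop(edges):
    """True for a non-empty edge set in which every induced degree is >= 2."""
    return bool(edges) and all(d >= 2 for d in induced_degrees(edges).values())


def polymers(edges):
    """Connected components of the subgraph, each returned as a set of edges."""
    adj = defaultdict(set)
    for i, a in edges:
        adj[("var", i)].add(("chk", a))
        adj[("chk", a)].add(("var", i))
    unvisited = set(adj)
    parts = []
    while unvisited:
        stack = [unvisited.pop()]
        component = set(stack)
        while stack:  # depth-first search from the chosen node
            node = stack.pop()
            for nxt in adj[node]:
                if nxt not in component:
                    component.add(nxt)
                    stack.append(nxt)
        unvisited -= component
        parts.append({(i, a) for i, a in edges if ("var", i) in component})
    return parts


if __name__ == "__main__":
    # Two disjoint 4-cycles: variables {0, 1} with checks {0, 1},
    # and variables {2, 3} with checks {2, 3}.
    g = {(0, 0), (0, 1), (1, 0), (1, 1), (2, 2), (2, 3), (3, 2), (3, 3)}
    print(is_generalized_loop(g), [sorted(p) for p in polymers(g)])
```

Each component of a generalized loop is again a generalized loop, which is what makes the factorization $K(g) = \prod_k K(\gamma_k)$ meaningful term by term.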

From a physical point of view (9) is the partition function
of polymers that can acquire any shape allowed by $\Gamma$, have
activity⁴ $K(\gamma)$, and interact via a two-body hard-core repulsion.
This analogy allows us to use methods from statistical
mechanics to analyze the corrections to the Bethe free energy.

We say that a polymer is small if $|\gamma| < \lambda n$ for some fixed $\lambda$
that we take in the interval $]0, \lambda_0[$. The contribution of small
polymers to (9) is

$$Z_p(\eta, \hat\eta) = \sum_{M \geq 0} \frac{1}{M!} \sum_{\gamma_1, \dots, \gamma_M \ {\rm s.t.}\ |\gamma_k| < \lambda n} \prod_{k=1}^{M} K(\gamma_k) \prod_{k < k'} 1\big(\gamma_k \cap \gamma_{k'} = \emptyset\big). \tag{10}$$

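Theorem 2: Suppose $l$ is odd and $3 \leq l < r$. Take $\Gamma$ at
random. There exists a small $h_0$ independent of $n$ such that
for $|h| < h_0$, and any high-noise-solution $(\eta, \hat\eta)$ of the BP
equations, with probability $1 - \frac{1}{\epsilon} O(n^{-(l(1-\kappa)-1)})$,

$$\frac{1}{n} \ln Z = f_{\rm Bethe}(\eta, \hat\eta) + \frac{1}{n} \ln Z_p(\eta, \hat\eta) + O(e^{-n\epsilon}) \tag{11}$$

for $\epsilon > 0$. Here $O(\cdot)$ is uniform in $\underline{h}$.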

The second term on the right hand side of (11) involves the
partition function of small polymers.

⁴ This is the name used by chemists to denote the probability weight
assuming that the polymer would be isolated. Note that here $K(\gamma)$ can be
negative and this analogy is at best formal.

[Fig. 2. All the Mayer graphs for $M = 1, 2, 3$.]

One can compute in a
systematic way the leading corrections to the Bethe free energy
by expanding the logarithm in powers of the activities $K(\gamma)$.
This yields the so-called polymer (or Mayer) expansion,

$$\frac{1}{n} \ln Z_p(\eta, \hat\eta) = \frac{1}{n} \sum_{M \geq 1}^{+\infty} \frac{1}{M!} \sum_{\gamma_1, \dots, \gamma_M \ {\rm s.t.}\ |\gamma_k| < \lambda n} \prod_{k=1}^{M} K(\gamma_k) \sum_{G \in \mathcal{G}_M} \prod_{(k,k') \in G} \big(-1(\gamma_k \cap \gamma_{k'} \neq \emptyset)\big). \tag{12}$$

The third sum is over the set $\mathcal{G}_M$ of all connected Mayer
graphs $G$ with $M$ vertices labeled by $\gamma_1, \dots, \gamma_M$ (see Figure 2).
Note that in the expansion of the logarithm, the indicator
function forces the polymers to overlap. Therefore the summation
contains an infinite number of terms and its convergence
has to be controlled.
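The combinatorics of the Mayer graphs can be illustrated on the simplest possible system, a single polymer $\gamma$ with activity $K$, for which $Z_p = 1 + K$. In the expansion every $\gamma_k$ equals $\gamma$, all pairs overlap, and the order-$M$ term is $\frac{K^M}{M!}$ times the sum of $(-1)^{|E(G)|}$ over connected labeled graphs $G$ on $M$ vertices. The sketch below enumerates these graphs by brute force; it is only an illustration under this single-polymer assumption, with hypothetical function names.

```python
import itertools
import math


def connected(m, edges):
    """True if the graph on vertices 0..m-1 with these edges is connected."""
    adj = {v: set() for v in range(m)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, stack = {0}, [0]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == m


def mayer_weight_sum(m):
    """Sum of (-1)^{number of edges} over connected labeled graphs on m vertices."""
    pairs = list(itertools.combinations(range(m), 2))
    total = 0
    for k in range(len(pairs) + 1):
        for edges in itertools.combinations(pairs, k):
            if connected(m, edges):
                total += (-1) ** k
    return total


def mayer_partial_sum(activity, order):
    """Expansion of ln(1 + K) truncated at the given order, single polymer."""
    return sum(
        activity ** m / math.factorial(m) * mayer_weight_sum(m)
        for m in range(1, order + 1)
    )


if __name__ == "__main__":
    print([mayer_weight_sum(m) for m in range(1, 5)])
    print(mayer_partial_sum(0.1, 4), math.log(1.1))
```

The graph sums equal $(-1)^{M-1}(M-1)!$, so the truncated expansion reproduces the Taylor series of $\ln(1 + K)$. This also shows why the expansion has infinitely many terms even though $Z_p$ itself is a finite sum.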

Lemma 1: Suppose $r > 2$. Fix $\zeta_0 > 1$ and replace $K(\gamma)$
by $\zeta K(\gamma)$ ($\zeta \in \mathbb{C}$) in the polymer expansion (12), which then
becomes a power series in the parameter $|\zeta| < \zeta_0$. Assume
that $\Gamma$ is a $(\lambda, \kappa)$ expander with
$\kappa \in \,]1 - \frac{2(r-1)}{lr},\, 1 - \frac{1}{l}[$. One
can find $h_0 > 0$ such that for $|h| < h_0$ this power series is
absolutely convergent uniformly in $n$ and $\underline{h}$.

Remark 3: This lemma holds for any $(l, r)$ with $r > 2$.

Remark 4: Our real interest is of course for $\zeta = 1$, and
the introduction of the parameter $\zeta$ above is just a convenient
way to describe the nature of the polymer expansion. The
lemma implies that one can compute the limit $n \to +\infty$ of
the polymer expansion term by term (for small polymers), and
that this limit is analytic for $|\zeta| < \zeta_0$. This lemma forms a
crucial part of the proofs of Theorems 1 and 2.

Remark 5: The last term on the right hand side of (11)
contains the contributions of large polymers of size greater
than $\lambda n$ (in a sea of small polymers). It turns out that this
contribution cannot be expanded into an absolutely convergent
series, and has to be treated non-perturbatively by counting
methods.

Lemma 1 has the following consequence:

Corollary 1: Suppose $r > 2$. One can find $h_0 > 0$
independent of $n$ such that for $|h| < h_0$,

$$\frac{1}{n}\, \mathbb{E}_\Gamma\big[|\ln Z_p(\eta, \hat\eta)|\big] = O\big(n^{-(l(1-\kappa)-1)}\big). \tag{13}$$

V. Convergence of the Polymer Expansion (12)

We give the main ideas of the proof of Lemma 1.

Proof of Lemma 1: A standard criterion for uniform
convergence and analyticity of the polymer expansion is [11]

$$Q \equiv \sum_{t=0}^{\infty} \frac{1}{t!} \sup_{z \in V \cup C} \sum_{\gamma \ni z} |\gamma|^t\, \zeta_0 |K(\gamma)| < 1. \tag{14}$$

If we prove that for polymers such that $|\gamma| < \lambda n$ we have

$$|K(\gamma)| \leq h^{c|\gamma|}, \tag{15}$$

then the result follows for $h$ small enough.

The main difficulty in proving (15) is that the (optimal)
estimate (34), (35) in the Appendix shows that $K(\gamma)$ is not
necessarily very small for graphs containing too many check
nodes of maximal induced degree and too many variable nodes
of even induced degree. More precisely, for these bad graphs
the activity is not exponentially small in the size of the graph.
Then it is not possible to compensate for the "entropy" of the
graph.

We will use an expander argument to show that these bad
cases do not occur when $|\gamma| < \lambda n$. We derive (15) with

$$c = r - \frac{2 + r}{3 - l(1 - \kappa)}. \tag{16}$$

In the process of this derivation one has to require
$3 - l(1 - \kappa) > 0$ and $c > 0$. This imposes the condition
$\kappa > 1 - \frac{2(r-1)}{lr}$ on the expansion constant. Note that an
expansion constant cannot be greater than $1 - 1/l$, so it is
fortunate that we have $1 - \frac{1}{l} > 1 - \frac{2(r-1)}{lr}$
(for any $r > 2$).
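The admissible window for the expansion constant and the resulting exponent are easy to tabulate. The sketch below assumes that (16) reads $c = r - \frac{2+r}{3 - l(1-\kappa)}$ and that the window is $]1 - \frac{2(r-1)}{lr}, 1 - \frac{1}{l}[$; exact rational arithmetic is used so that the endpoints can be compared without rounding. The function names are illustrative.

```python
from fractions import Fraction


def kappa_window(l, r):
    """Open interval of admissible expansion constants (lower, upper)."""
    return 1 - Fraction(2 * (r - 1), l * r), 1 - Fraction(1, l)


def decay_exponent(l, r, kappa):
    """Exponent c of (16), under the reading assumed in the text above."""
    denom = 3 - l * (1 - kappa)
    if denom <= 0:
        raise ValueError("need 3 - l(1 - kappa) > 0")
    return r - Fraction(2 + r) / denom


if __name__ == "__main__":
    low, high = kappa_window(3, 6)
    print("window for (3, 6):", low, high)
    print("c at kappa = 1/2:", decay_exponent(3, 6, Fraction(1, 2)))
    print("c at the lower endpoint:", decay_exponent(3, 6, low))
```

With this reading the exponent vanishes exactly at the lower endpoint of the window, which is how the requirement $c > 0$ translates into the stated condition on $\kappa$.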



Now we sketch the proof of (15) and (16). Recall that $d_i(\gamma)$
(resp. $d_a(\gamma)$) is the induced degree of node $i$ (resp. $a$) in $\gamma$. The
type of $\gamma$ is given by two vectors $\underline{n} = (n_s(\gamma))_{s=2}^{l}$ and
$\underline{m} = (m_t(\gamma))_{t=2}^{r}$ defined as
$n_s(\gamma) := |\{i \in \gamma \cap V \mid d_i(\gamma) = s\}|$ and
$m_t(\gamma) := |\{a \in \gamma \cap C \mid d_a(\gamma) = t\}|$. In words, $n_s(\gamma)$ and
$m_t(\gamma)$ count the number of variable and check nodes with
induced degrees $s$ and $t$ in $\gamma$. Note that we have the constraints

$$|\gamma| = \sum_{s=2}^{l} n_s(\gamma) + \sum_{t=2}^{r} m_t(\gamma), \qquad \sum_{s=2}^{l} s\, n_s(\gamma) = \sum_{t=2}^{r} t\, m_t(\gamma). \tag{17}$$

We apply the expander property to the set $\mathcal{V} = \{i \in \gamma \cap V\}$.
This reads

$$|\partial\mathcal{V}| \geq \kappa l \sum_{s=2}^{l} n_s(\gamma). \tag{18}$$

On the other hand
$|\partial\mathcal{V}| \leq \sum_{t=2}^{r} m_t(\gamma) + \sum_{s=2}^{l-1} (l - s)\, n_s(\gamma)$.
With (18) this yields the constraint

$$\sum_{t=2}^{r} m_t(\gamma) + \sum_{s=2}^{l-1} (l - s)\, n_s(\gamma) \geq \kappa l \sum_{s=2}^{l} n_s(\gamma). \tag{19}$$

Using all constraints (17) and (19) we can prove

$$\sum_{t=2}^{r-1} (r - t)\, m_t(\gamma) \geq \Big(r - \frac{2 + r}{3 - l(1 - \kappa)}\Big) |\gamma|. \tag{20}$$

Finally, keeping only the product over $t = 2, \dots, r - 1$ in
estimates (34) and (35) in the Appendix, we obtain (15). ■
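The counting step can be spot-checked by enumeration. The sketch below lists all small types $(\underline{n}, \underline{m})$, keeps those satisfying the degree-sum identity in (17) and the expander constraint (19), and verifies the inequality $\sum_{t=2}^{r-1}(r-t)\,m_t \geq c\,|\gamma|$ with $c = r - \frac{2+r}{3 - l(1-\kappa)}$. These readings of (17), (19), (20) are assumptions, the enumerated types need not be realizable as actual polymers, and the bound on the counts is an arbitrary illustrative cutoff.

```python
from fractions import Fraction
from itertools import product


def check_counting_bound(l, r, kappa, max_count=3):
    """Return (feasible, violations) over all types with entries <= max_count."""
    c = r - Fraction(2 + r) / (3 - l * (1 - kappa))
    s_vals = range(2, l + 1)
    t_vals = range(2, r + 1)
    feasible = violations = 0
    for ns in product(range(max_count + 1), repeat=len(s_vals)):
        for mt in product(range(max_count + 1), repeat=len(t_vals)):
            n_var, n_chk = sum(ns), sum(mt)
            if n_var + n_chk == 0:
                continue
            edges_var = sum(s * n for s, n in zip(s_vals, ns))
            edges_chk = sum(t * m for t, m in zip(t_vals, mt))
            if edges_var != edges_chk:  # degree-sum identity in (17)
                continue
            slack = n_chk + sum((l - s) * n for s, n in zip(s_vals, ns))
            if slack < kappa * l * n_var:  # expander constraint (19)
                continue
            feasible += 1
            lhs = sum((r - t) * m for t, m in zip(t_vals, mt))
            if lhs < c * (n_var + n_chk):  # inequality (20)
                violations += 1
    return feasible, violations


if __name__ == "__main__":
    print(check_counting_bound(3, 6, Fraction(1, 2)))
```

The absence of violations reflects a short argument: (19) gives $\sum_t m_t \geq (2 - l(1-\kappa)) \sum_s n_s$ because every induced degree is at least 2, and substituting this into $\sum_t (r-t)\,m_t = r \sum_t m_t - \sum_s s\, n_s$ yields the constant above.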
Proof of Corollary 1: Conditional on $\Gamma$ being an expander
we have from the previous proof $0 < Q \ll 1$. Then,
polymer expansion techniques [11] allow us to estimate the sum
over $M$ in (12) term by term, which yields

$$\Big|\frac{1}{n} \ln Z_p(\eta, \hat\eta)\Big| \leq (1 - Q)^{-1}\, n^{-1} \sum_{z \in V \cup C}\ \sum_{\gamma \ni z,\ |\gamma| < \lambda n} |K(\gamma)|\, e^{|\gamma|}. \tag{21}$$

If we take the expectation over graphs, the sum
over $z \in V \cup C$ and the factor $n^{-1}$ cancel. This allows us to consider
a sum of polymers rooted at one vertex. We compute this
expectation by conditioning on the first event that $\Gamma$ is tree-like
in a neighborhood of size $O(\ln n)$ around this vertex,
and on the second complementary event. The second event
has small probability $O(n^{-(1-\beta)})$ for any $0 < \beta < 1$.
Besides, from (21) and (15) it is easy to show that $n^{-1} |\ln Z_p|$
is bounded. For the first event we have that the smallest
polymer is a cycle with $|\gamma| = O(\ln n)$. This with (21) and
(15) implies that $n^{-1} |\ln Z_p(\eta, \hat\eta)| \leq n^{-c' |\ln h|}$ for a constant $c' > 0$. Combining
all these remarks with the fact that $\Gamma$ is an expander with
probability $1 - O(n^{-(l(1-\kappa)-1)})$ we obtain (13). ■

VI. Probability estimates on graphs 

In this section we deal with the contribution $R(\eta, \hat\eta)$
corresponding to terms containing at least one large polymer in
(9). We have

$$\sum_{g \subset \Gamma} K(g) = Z_p(\eta, \hat\eta) + R(\eta, \hat\eta), \tag{22}$$

where

$$R(\eta, \hat\eta) = \sum_{g \subset \Gamma \ {\rm s.t.}\ \exists \gamma \subset g \ {\rm with}\ |\gamma| \geq \lambda n} K(g). \tag{23}$$

The next lemma shows that the contribution from large
polymers is exponentially small, with high probability with
respect to the graph ensemble.

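Lemma 2: Fix $\delta > 0$. Assume $l \geq 3$ odd and $l < r$. There
exists a constant $C > 0$ depending only on $l$ and $r$ such that
for $h$ small enough

$$\mathbb{P}\big[|R(\eta, \hat\eta)| \geq \delta\big] \leq \frac{1}{\delta}\, e^{-Cn}. \tag{24}$$

Sketch of Proof: Let $\Omega_\Gamma(\underline{n}, \underline{m})$ be the set of all $g \subset \Gamma$
with prescribed type $(\underline{n}(g), \underline{m}(g))$. By (34) and the Markov
bound

$$\mathbb{P}\Big[\sum_{g \subset \Gamma \ {\rm with}\ |g| \geq \lambda n} |K(g)| \geq \delta\Big] \leq \frac{1}{\delta} \sum_{(\underline{n}, \underline{m}) \in \Delta} K(\underline{n}, \underline{m})\, \mathbb{E}_\Gamma\big[|\Omega_\Gamma(\underline{n}, \underline{m})|\big]. \tag{25}$$

Notice that the probability in (25) is an upper bound on the
probability in (24). In (25) we have

$$\Delta = \Big\{(\underline{n}, \underline{m}) \ \Big|\ \lambda n \leq \sum_{s=2}^{l} n_s + \sum_{t=2}^{r} m_t,\ \ \sum_{s=2}^{l} s\, n_s = \sum_{t=2}^{r} t\, m_t,\ \ \sum_{s=2}^{l} n_s \leq n,\ \ \sum_{t=2}^{r} m_t \leq \frac{nl}{r}\Big\}. \tag{26}$$

The expectation of the number of $g \subset \Gamma$ with prescribed type
can be estimated by combinatorial bounds provided by McKay
[13]. It turns out that these subgraphs proliferate exponentially
in $n$ only for a subdomain of $\Delta$ where $K(\underline{n}, \underline{m})$ is exponentially
smaller in $n$. In the subdomain where $K(\underline{n}, \underline{m})$ is not
small (but it is always bounded) the number of subgraphs is
subexponential when $l$ is odd and $l < r$. As a consequence,
for $l$ odd and $l < r$, we are able to prove that the sum on the
right hand side of (25) is smaller than $e^{-Cn}$. Unfortunately
our estimates break down for $l$ even. ■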

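VII. Sketch of Proof of Theorems 1 and 2

We write

$$\frac{1}{n} \ln\Big\{\sum_{g \subset \Gamma} K(g)\Big\} = \frac{1}{n} \ln Z_p(\eta, \hat\eta) + \frac{1}{n} \ln\Big(1 + \frac{R(\eta, \hat\eta)}{Z_p(\eta, \hat\eta)}\Big). \tag{27}$$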

We first look at the second contribution coming from large
polymers. From Corollary 1 and the Markov bound, we have
for any $\epsilon > 0$,

$$\mathbb{P}\big[e^{-n\epsilon} \leq Z_p(\eta, \hat\eta) \leq e^{n\epsilon}\big] = 1 - \frac{1}{\epsilon}\, O\big(n^{-(l(1-\kappa)-1)}\big). \tag{28}$$

Using inequalities (24) and (28), and choosing $\delta = e^{-2n\epsilon}$, it
is not difficult to show that (at this point one takes $2\epsilon < C$)

$$\mathbb{P}\Big[\Big|\frac{R(\eta, \hat\eta)}{Z_p(\eta, \hat\eta)}\Big| \geq e^{-n\epsilon}\Big] \leq \frac{1}{\epsilon}\, O\big(n^{-(l(1-\kappa)-1)}\big) + e^{-n(C - 2\epsilon)}. \tag{29}$$

This allows us to conclude that with probability
$1 - \frac{1}{\epsilon}\, O(n^{-(l(1-\kappa)-1)})$,

$$\frac{1}{n} \ln\Big(1 + \frac{R(\eta, \hat\eta)}{Z_p(\eta, \hat\eta)}\Big) = O(e^{-n\epsilon}). \tag{30}$$

This already proves Theorem 2.

It is now easy to show Theorem 1. There is a probability
$O(n^{-(l(1-\kappa)-1)})$ that this last term is not small. However we
can always show it is bounded by a constant independent of $n$.
Indeed it is equal to the difference
$n^{-1} \ln Z - f_{\rm Bethe}(\eta, \hat\eta) - n^{-1} \ln Z_p(\eta, \hat\eta)$,
where each term separately can be shown to
be bounded by a constant independent of $n$. Furthermore,
Corollary 1 tells us that the expectation of the absolute value
of the first term on the r.h.s. of (27) is $O(n^{-(l(1-\kappa)-1)})$. Combining
these remarks allows us to conclude the proof of Theorem 1.



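VIII. Appendix

We have

$$K(g) = \prod_{i \in g \cap V} K_i \prod_{a \in g \cap C} K_a. \tag{31}$$

Quantities $K_a, K_i$ are local and can be computed only with
BP messages. Let $m_i = \tanh\big(h_i + \sum_{a \in \partial i} \hat\eta_{a \to i}\big)$. Then

$$K_i = \frac{(1 - m_i)^{d_i(g) - 1} + (-1)^{d_i(g)} (1 + m_i)^{d_i(g) - 1}}{2\, (1 - m_i^2)^{d_i(g) - 1}}, \tag{32}$$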



1 - tanh' r/^a 



Ka= II " 11 

tedang \ 1 ~ ^ljeda\i tanh Vj-^a iedang c 



Yl tanh r)i^. a 



1 + U l eda tanh 



n v 1 -^ 2 

(33) 



Using these formulas and the BP equations we derive the
following estimate for $|h_i| < h_0$ small enough

$$|K(g)| \leq K(\underline{n}(g), \underline{m}(g)), \tag{34}$$

where

K(n(9),m(9)) = (l - a r rhT As) f[ (a t h^) mt{9) 



i-i 



8=2, 

even 



t=2 

s{g) i 

^ l[(i3 s (s-i)hr {9) . 

s=3, 
odd 

(35) 



Here < a r < 1, at > 1, 1 are fixed numerical 

constants (that we can take close to 1). Estimate (35) is
essentially optimal for small $h$, as can be checked by Taylor
expanding $K(g)$ in powers of $h_i$.
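The role of the parity of the induced degree can be seen directly on the variable-node factor. The sketch below assumes the form $K_i = \frac{(1-m_i)^{d-1} + (-1)^{d}(1+m_i)^{d-1}}{2(1-m_i^2)^{d-1}}$ with $d = d_i(g)$, and evaluates it for a small magnetization $m_i$: even degrees give a factor close to 1, while odd degrees give a factor close to $-(d-1)\,m_i$. The function name is illustrative.

```python
def vertex_factor(d, m):
    """Variable-node factor K_i for induced degree d and magnetization m."""
    numerator = (1 - m) ** (d - 1) + (-1) ** d * (1 + m) ** (d - 1)
    return numerator / (2 * (1 - m * m) ** (d - 1))


if __name__ == "__main__":
    m = 1e-3  # small magnetization, as in the high-noise regime
    for d in (2, 3, 4, 5):
        print(d, vertex_factor(d, m))
```

This is the mechanism behind the "bad graphs": variable nodes of even induced degree contribute no small factor, so the smallness of the activity has to come from the check nodes and from odd-degree variable nodes.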